Sparse Regularization


Smooth Bilevel Programming for Sparse Regularization

Neural Information Processing Systems

Iteratively reweighted least squares (IRLS) is a popular approach to solving sparsity-enforcing regression problems in machine learning. State-of-the-art approaches are more efficient but typically rely on specific coordinate pruning schemes. In this work, we show how a surprisingly simple re-parametrization of IRLS, coupled with a bilevel resolution (instead of an alternating scheme), is able to achieve top performance on a wide range of sparsity-inducing regularizers (such as Lasso, group Lasso and trace norm regularizations), regularization strengths (including hard constraints), and design matrices (ranging from correlated designs to differential operators). Similarly to IRLS, our method only involves the resolution of linear systems but, in sharp contrast, corresponds to the minimization of a smooth function. Despite being non-convex, we show that there are no spurious minima and that saddle points are "ridable", so that there always exists a descent direction. We thus advocate the use of a BFGS quasi-Newton solver, which makes our approach simple, robust and efficient. We perform a numerical benchmark of the convergence speed of our algorithm against state-of-the-art solvers for Lasso, group Lasso, trace norm and linearly constrained problems. These results highlight the versatility of our approach, removing the need to use different solvers depending on the specifics of the ML problem under study.
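The classical IRLS scheme that the paper re-parametrizes alternates between a weighted least-squares solve and a weight update. A minimal sketch for the Lasso case (toy data; `irls_lasso` and all parameter values are illustrative, not the paper's bilevel method):

```python
import numpy as np

def irls_lasso(A, y, lam, n_iter=100, eps=1e-8):
    """Iteratively reweighted least squares for the Lasso:
    min_x 0.5*||Ax - y||^2 + lam*||x||_1.
    Each step solves a weighted ridge system, so only linear
    solves are needed (the property the paper builds on)."""
    n = A.shape[1]
    x = np.zeros(n)
    AtA, Aty = A.T @ A, A.T @ y
    for _ in range(n_iter):
        # |x_i| is majorized via the weight 1/(|x_i| + eps)
        w = 1.0 / (np.abs(x) + eps)
        x = np.linalg.solve(AtA + lam * np.diag(w), Aty)
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[:3] = [2.0, -1.5, 1.0]
y = A @ x_true + 0.01 * rng.standard_normal(50)
x_hat = irls_lasso(A, y, lam=0.5)
```

On this toy problem the iteration recovers the 3-sparse support while the remaining coefficients decay toward zero.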


Sparse-Reg: Improving Sample Complexity in Offline Reinforcement Learning using Sparsity

Arnob, Samin Yeasar, Fujimoto, Scott, Precup, Doina

arXiv.org Artificial Intelligence

In this paper, we investigate the use of small datasets in the context of offline reinforcement learning (RL). While many common offline RL benchmarks employ datasets with over a million data points, many offline RL applications rely on considerably smaller datasets. We show that offline RL algorithms can overfit on small datasets, resulting in poor performance. To address this challenge, we introduce "Sparse-Reg": a regularization technique based on sparsity to mitigate overfitting in offline reinforcement learning, enabling effective learning in limited data settings and outperforming state-of-the-art baselines in continuous control.


Enhancing One-shot Pruned Pre-trained Language Models through Sparse-Dense-Sparse Mechanism

Li, Guanchen, Zhao, Xiandong, Liu, Lian, Li, Zeping, Li, Dong, Tian, Lu, He, Jie, Sirasao, Ashish, Barsoum, Emad

arXiv.org Artificial Intelligence

Pre-trained language models (PLMs) are engineered to be robust in contextual understanding and exhibit outstanding performance in various natural language processing tasks. However, their considerable size incurs significant computational and storage costs. Modern pruning strategies employ one-shot techniques to compress PLMs without retraining on task-specific or otherwise general data; however, these approaches often lead to an unavoidable reduction in performance. In this paper, we propose SDS, a Sparse-Dense-Sparse pruning framework, to enhance the performance of pruned PLMs from a weight-distribution optimization perspective. We outline the pruning process in three steps. Initially, we prune less critical connections in the model using conventional one-shot pruning methods. Next, we reconstruct a dense model featuring a pruning-friendly weight distribution by reactivating pruned connections with sparse regularization. Finally, we perform a second pruning round, yielding a superior pruned model compared to the initial pruning. Experimental results demonstrate that SDS outperforms the state-of-the-art pruning techniques SparseGPT and Wanda under an identical sparsity configuration. For instance, SDS reduces perplexity by 9.13 on Raw-Wikitext2 and improves accuracy by an average of 2.05% across multiple zero-shot benchmarks for OPT-125M with 2:4 sparsity.
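The 2:4 sparsity configuration used in the benchmark keeps at most 2 nonzero weights in every group of 4 consecutive weights. A minimal one-shot magnitude-pruning sketch of that pattern (illustrative only; SDS's dense reconstruction and second pruning round are not shown):

```python
import numpy as np

def prune_2_4(w):
    """One-shot 2:4 magnitude pruning: in every group of 4
    consecutive weights, zero out the 2 smallest in magnitude.
    Assumes w.size is divisible by 4 (real kernels are padded)."""
    flat = w.reshape(-1, 4).copy()
    # indices of the two smallest-magnitude entries per group
    idx = np.argsort(np.abs(flat), axis=1)[:, :2]
    np.put_along_axis(flat, idx, 0.0, axis=1)
    return flat.reshape(w.shape)

w = np.array([[0.9, -0.1, 0.05, -1.2],
              [0.3, 0.02, -0.4, 0.01]])
w_pruned = prune_2_4(w)
# each group of 4 now contains exactly 2 nonzeros
```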


Multi-stage Convex Relaxation for Learning with Sparse Regularization

Neural Information Processing Systems

We study learning formulations with non-convex regularization that are natural for sparse linear models. There are two approaches to this problem: (1) heuristic methods such as gradient descent that only find a local minimum; a drawback of this approach is the lack of theoretical guarantees that the local minimum gives a good solution. (2) Convex relaxation such as L1-regularization, which solves the problem under certain conditions; however, it often leads to sub-optimal sparsity in practice. This paper tries to remedy the above gap between theory and practice.
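The tension between the two approaches can be illustrated with a small multi-stage scheme in the spirit of the paper: each stage solves a convex weighted-L1 problem, and coefficients that are already large are exempted from the penalty in the next stage, removing the L1 shrinkage bias. This is a sketch under simplifying assumptions (ISTA as the inner solver, a capped-L1-style reweighting), not the paper's exact formulation:

```python
import numpy as np

def ista(A, y, lam_vec, n_iter=500):
    """Proximal gradient (ISTA) for min 0.5||Ax-y||^2 + sum_i lam_i|x_i|."""
    L = np.linalg.norm(A, 2) ** 2        # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        z = x - A.T @ (A @ x - y) / L    # gradient step
        x = np.sign(z) * np.maximum(np.abs(z) - lam_vec / L, 0.0)  # soft-threshold
    return x

def multi_stage(A, y, lam, theta, n_stages=3):
    """Multi-stage relaxation: after each convex stage, coefficients
    that are already large (|x_i| > theta) are no longer penalized."""
    lam_vec = np.full(A.shape[1], lam)
    x = np.zeros(A.shape[1])
    for _ in range(n_stages):
        x = ista(A, y, lam_vec)
        lam_vec = np.where(np.abs(x) > theta, 0.0, lam)
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((40, 10))
x_true = np.zeros(10)
x_true[:3] = [3.0, -2.0, 1.5]
y = A @ x_true                            # noiseless toy problem
x_lasso = ista(A, y, np.full(10, 10.0))   # single convex stage (biased)
x_ms = multi_stage(A, y, lam=10.0, theta=0.5)
```

On this noiseless toy problem the single-stage L1 estimate is visibly shrunk toward zero on the true support, while the multi-stage estimate is essentially unbiased.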


DS-TDNN: Dual-stream Time-delay Neural Network with Global-aware Filter for Speaker Verification

Li, Yangfu, Gan, Jiapan, Lin, Xiaodan

arXiv.org Artificial Intelligence

Conventional time-delay neural networks (TDNNs) struggle to handle long-range context, so their ability to represent speaker information is limited in long utterances. Existing solutions either depend on increasing model complexity or try to balance local features against global context to address this issue. To effectively leverage the long-term dependencies of audio signals while constraining model complexity, we introduce a novel module called the Global-aware Filter layer (GF layer), which employs a set of learnable transform-domain filters between a 1D discrete Fourier transform and its inverse transform to capture global context. Additionally, we develop a dynamic filtering strategy and a sparse regularization method to enhance the performance of the GF layer and prevent overfitting. Based on the GF layer, we present a dual-stream TDNN architecture called DS-TDNN for automatic speaker verification (ASV), which utilizes two unique branches to extract local and global features in parallel and employs an efficient strategy to fuse information at different scales. Experiments on the Voxceleb and SITW databases demonstrate that the DS-TDNN achieves a relative improvement of 10% together with a relative decline of 20% in computational cost over the ECAPA-TDNN on the speaker verification task. This improvement becomes more evident as the utterance's duration grows. Furthermore, the DS-TDNN also beats popular deep residual models and attention-based systems on utterances of arbitrary length.
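The core operation of the GF layer described above, a 1D DFT, a pointwise multiplication by learnable frequency-domain filters, and an inverse DFT, can be sketched as follows (shapes and names are illustrative; in the real model `h` is a trained parameter):

```python
import numpy as np

def global_filter(x, h):
    """Global-aware filtering sketch: move a feature sequence to the
    frequency domain, multiply by a per-bin filter, transform back.
    x: (time, channels) real features; h: (freq_bins, channels)
    complex filter, with freq_bins = time // 2 + 1 for rfft."""
    X = np.fft.rfft(x, axis=0)     # 1D DFT along the time axis
    Y = X * h                      # per-bin, per-channel filtering
    return np.fft.irfft(Y, n=x.shape[0], axis=0)

T, C = 64, 8
rng = np.random.default_rng(0)
x = rng.standard_normal((T, C))
h = np.ones((T // 2 + 1, C), dtype=complex)  # identity filter
y = global_filter(x, h)                      # y == x for this filter
```

Because every frequency bin mixes information from the whole sequence, a single GF layer has a global receptive field at FFT cost, which is the efficiency argument behind the design.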


Sparse then Prune: Toward Efficient Vision Transformers

Prasetyo, Yogi, Yudistira, Novanto, Widodo, Agus Wahyu

arXiv.org Artificial Intelligence

The Vision Transformer architecture is a deep learning model inspired by the success of the Transformer model in Natural Language Processing. However, the self-attention mechanism, large number of parameters, and the requirement for a substantial amount of training data still make Vision Transformers computationally burdensome. In this research, we investigate the possibility of applying Sparse Regularization to Vision Transformers and the impact of Pruning, either after Sparse Regularization or without it, on the trade-off between performance and efficiency. To accomplish this, we apply Sparse Regularization and Pruning methods to the Vision Transformer architecture for image classification tasks on the CIFAR-10, CIFAR-100, and ImageNet-100 datasets. The training process for the Vision Transformer model consists of two parts: pre-training and fine-tuning. Pre-training utilizes ImageNet21K data, followed by fine-tuning for 20 epochs. The results show that when testing with CIFAR-100 and ImageNet-100 data, models with Sparse Regularization can increase accuracy by 0.12%. Furthermore, applying pruning to models with Sparse Regularization yields even better results: it increases the average accuracy by 0.568% on CIFAR-10, 1.764% on CIFAR-100, and 0.256% on ImageNet-100 compared to pruning models without Sparse Regularization. The code can be accessed here: https://github.com/yogiprsty/Sparse-ViT
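The combination the paper studies, training with a sparse regularization term and then pruning by magnitude, can be sketched generically (helper names are illustrative; the paper's exact regularizer and pruning schedule may differ):

```python
import numpy as np

def l1_penalty(weights, lam):
    """Sparse regularization term added to the task loss during training."""
    return lam * sum(np.abs(w).sum() for w in weights)

def magnitude_prune(weights, sparsity):
    """Global unstructured pruning: zero the smallest-magnitude
    fraction `sparsity` of all weights across every tensor."""
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights])
    thresh = np.quantile(all_mags, sparsity)
    return [np.where(np.abs(w) <= thresh, 0.0, w) for w in weights]

rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 4)), rng.standard_normal(8)]  # 24 weights
reg = l1_penalty(weights, lam=1e-4)          # would be added to the loss
pruned = magnitude_prune(weights, sparsity=0.5)
```

Training with the penalty first pushes unimportant weights toward zero, so the subsequent magnitude cut discards less useful signal, which is the intuition behind the reported accuracy gains.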


Robustifying DARTS by Eliminating Information Bypass Leakage via Explicit Sparse Regularization

Zhang, Jiuling, Ding, Zhiming

arXiv.org Artificial Intelligence

Differentiable architecture search (DARTS) is a promising end-to-end NAS method which directly optimizes the architecture parameters through standard gradient descent. However, DARTS is brittle to the catastrophic failure incurred by the skip connection in the search space. Recent studies also cast doubt on the basic underlying hypotheses of DARTS, which are argued to be inherently prone to the performance discrepancy between the continuous-relaxed supernet in the training phase and the discretized finalnet in the evaluation phase. We find that the robustness problem and the skepticism can both be explained by information bypass leakage during the training of the supernet. This naturally highlights the vital role of the sparsity of architecture parameters in the training phase, which has not been well developed in the past. We thus propose a novel sparse-regularized approximation and an efficient mixed-sparsity training scheme to robustify DARTS by eliminating the information bypass leakage. We subsequently conduct extensive experiments on multiple search spaces to demonstrate the effectiveness of our method.
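The abstract does not spell out the sparse-regularized approximation, but the general idea of replacing the dense softmax over architecture parameters with a sparsity-inducing map can be illustrated with sparsemax, which drives weak candidate operations exactly to zero (a stand-in for illustration, not the paper's method):

```python
import numpy as np

def sparsemax(z):
    """Sparse alternative to softmax (Martins & Astudillo, 2016):
    the Euclidean projection of z onto the probability simplex,
    which sets low-scoring entries exactly to zero."""
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    cssv = np.cumsum(z_sorted)
    support = 1 + k * z_sorted > cssv      # entries kept in the support
    k_max = k[support][-1]
    tau = (cssv[k_max - 1] - 1.0) / k_max  # threshold
    return np.maximum(z - tau, 0.0)

alpha = np.array([2.0, 1.9, 0.1, -1.0])    # toy architecture scores
p = sparsemax(alpha)                       # weak ops get exactly zero weight
```

With exact zeros on weak operations during supernet training, no gradient or feature can "leak" through them, which is the role sparsity plays in the method described above.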


Sparse Deep Neural Network for Nonlinear Partial Differential Equations

Xu, Yuesheng, Zeng, Taishan

arXiv.org Artificial Intelligence

More capable learning models are demanded for data processing due to the increasingly large amounts of data available in applications. Data that we encounter often have certain embedded sparsity structures; that is, if they are represented in an appropriate basis, their energy concentrates on a small number of basis functions. This paper is devoted to a numerical study of adaptive approximation, by deep neural networks (DNNs) with sparse regularization using multiple parameters, of solutions of nonlinear partial differential equations whose solutions may have singularities. Noting that DNNs have an intrinsic multi-scale structure favorable for adaptive representation of functions, we employ a penalty with multiple parameters to develop DNNs with multi-scale sparse regularization (SDNN) for effectively representing functions having certain singularities. We then apply the proposed SDNN to the numerical solution of the Burgers equation and the Schrödinger equation. Numerical examples confirm that solutions generated by the proposed SDNN are sparse and accurate.
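The penalty with multiple parameters can be sketched as one regularization weight per layer (scale), so that different scales are penalized differently (a hypothetical minimal form; the paper's penalty may differ):

```python
import numpy as np

def multiscale_l1(layers, lams):
    """Sparse regularization with one parameter per layer/scale:
    R(W) = sum_l lam_l * ||W_l||_1. Later (finer-scale) layers can
    be penalized more heavily to promote adaptive sparsity."""
    assert len(layers) == len(lams)
    return sum(lam * np.abs(w).sum() for w, lam in zip(layers, lams))

# toy two-layer network weights with a heavier penalty on layer 2
layers = [np.array([[1.0, -2.0]]), np.array([[0.5], [-0.5]])]
reg = multiscale_l1(layers, [0.1, 1.0])   # 0.1*(1+2) + 1.0*(0.5+0.5) = 1.3
```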


Pruning-aware Sparse Regularization for Network Pruning

Jiang, Nanfei, Zhao, Xu, Zhao, Chaoyang, An, Yongqi, Tang, Ming, Wang, Jinqiao

arXiv.org Artificial Intelligence

Structural neural network pruning aims to remove the redundant channels in deep convolutional neural networks (CNNs) by pruning the filters of less importance to the final output accuracy. To reduce the degradation of performance after pruning, many methods utilize a loss with sparse regularization to produce structured sparsity. In this paper, we analyze these sparsity-training-based methods and find that the regularization of unpruned channels is unnecessary. Moreover, it restricts the network's capacity, which leads to under-fitting. To solve this problem, we propose a novel pruning method, named MaskSparsity, with pruning-aware sparse regularization. MaskSparsity imposes the fine-grained sparse regularization on the specific filters selected by a pruning mask, rather than on all the filters of the model. Before the fine-grained sparse regularization of MaskSparsity, we can use many methods to obtain the pruning mask, such as running the global sparse regularization. MaskSparsity achieves a 63.03% FLOPs reduction on ResNet-110 by removing 60.34% of the parameters, with no top-1 accuracy loss on CIFAR-10. On ILSVRC-2012, MaskSparsity reduces FLOPs by more than 51.07% on ResNet-50, with only a 0.76% loss in top-1 accuracy. The code is released at https://github.com/CASIA-IVA-Lab/MaskSparsity. Moreover, we have integrated the code of MaskSparsity into a PyTorch pruning toolkit, EasyPruner, at https://gitee.com/casia_iva_engineer/easypruner.
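The pruning-aware penalty described above applies sparse regularization only to the channels selected by the pruning mask, leaving the kept channels unregularized. A minimal sketch using per-channel scale factors (names illustrative):

```python
import numpy as np

def mask_sparsity_penalty(gammas, prune_mask, lam):
    """Pruning-aware sparse regularization: penalize only the
    per-channel scale factors of channels selected for pruning
    (prune_mask == True), so the capacity of the kept channels
    is not restricted during sparsity training."""
    return lam * np.abs(gammas[prune_mask]).sum()

gammas = np.array([0.9, 0.02, 0.8, 0.01])          # per-channel scales
prune_mask = np.array([False, True, False, True])  # channels to prune
penalty = mask_sparsity_penalty(gammas, prune_mask, lam=1e-3)
```

Adding this term to the training loss drives only the masked channels toward zero, so they can be removed afterwards with little effect on the output.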